Abstract 

Face recognition has been studied extensively for more than 20 years. 
Since the beginning of the 1990s the subject has become a major research 
topic. This technology is used in many important real-world applications, 
such as video surveillance, smart cards, database security, and internet 
and intranet access. 

This report reviews two recent algorithms for face recognition which take 
advantage of a relatively new multiscale geometric analysis tool, the Curvelet 
transform, for facial processing and feature extraction. This transform proves 
to be efficient, especially due to its good ability to detect curves and lines, 
which characterize the human face. 

An algorithm which is based on the two algorithms mentioned above 
is proposed, and its performance is evaluated on three databases of faces: 
AT&T (ORL), Essex Grimace and Georgia-Tech. k-nearest neighbour (k-NN) 
and Support Vector Machine (SVM) classifiers are used, along with 
Principal Component Analysis (PCA) for dimensionality reduction. 

This algorithm shows good results, and it even outperforms other algo- 
rithms in some cases. 
1 Introduction 

Over the last ten years or so, face recognition has become a popular area of 
research in computer vision and one of the most successful applications of 
image analysis and understanding. Because of the nature of the problem, it 
interests not only computer science researchers but also neuroscientists and 
psychologists. It is generally believed that advances in computer vision 
research will provide useful insights to neuroscientists and psychologists into 
how the human brain works, and vice versa. 

General face recognition algorithms include three key steps: (1) face 
detection and normalization; (2) feature extraction; (3) identification or 
verification. The general recognition process is depicted in Figure 1. In this 
project we focus on the last two steps. 

Facial feature extraction is crucial to face recognition and facial expression 
recognition. It is clear that an appropriate choice of the representative features 
has a crucial effect on the performance of the recognition algorithm. 




Figure 1: General face recognition system 



Studies in human visual system and image statistics show that an ideal 
image representation or a feature extraction method should satisfy the fol- 
lowing five conditions [1]: multiresolution, localization, critical sampling, 
directionality and anisotropy. 

In recent years, many outstanding algorithms have been proposed for feature 
extraction. Wavelet analysis (using the well-known wavelet transform) 
is a significant feature extraction tool because of its ability to localize in 
both the time domain and the frequency domain, which can help us focus on 
specific parts of a given image. An example is depicted in Figure 2. 
Figure 2: Example of the 2D discrete wavelet transform, from [2] 

The wavelet transform is also used in image compression algorithms, such as 
JPEG-2000. However, the wavelet transform can only capture point singularities 
in an image, rather than the curves and lines which appear in the human face. 
Moreover, this transform does not satisfy the last two conditions above 
(directionality and anisotropy). 

A more suitable transform for the task of face recognition is the Curvelet 
transform. It is a relatively new multiscale analysis tool, which was proposed 
in 1999 [3] and revised in 2006 [4]. Curvelets provide optimally sparse 
representations of objects which display smoothness except for discontinuity 
along a general curve with bounded curvature. 



In this project, two algorithms [5, 6] which use this transform are 
described and discussed. Both of them use the coefficients which 
are extracted using the curvelet transform, but the second one also employs 
Principal Component Analysis (PCA) in order to reduce dimensionality. This 
reduction plays an important role in the process, since a typical 100 x 100 
pixel image can have thousands of coefficients. 

An algorithm which integrates both algorithms above with some 
improvements is implemented, and its performance is evaluated. It is shown that 
it produces similar results, and in some cases even better results, than the 
algorithms mentioned above. 

This report is organized as follows. First, in section 2 we review the 
Curvelet transform. Then, we describe and discuss the algorithms in section 3. 
The algorithm which is implemented is described in sections 4 and 5, with 
appropriate performance evaluation, and it is also compared to both algorithms 
above. Information about the attached Matlab code is given in section 6. 
Finally, conclusions are given in section 7. 

Descriptions of k-NN and SVM are presented in Appendices A and B. 



2 Curvelet transform 

The Curvelet transform is a kind of multi-resolution analysis tool. Its main 
advantage is the ability to use a relatively small number of coefficients to 
reconstruct edge details in an image. Each matrix of coefficients is 
characterized by both an angle and a scale. That is, 
where $\varphi$ denotes the basis functions, $j$ and $l$ are the scale and 
angle, and $(m, n)$ is an index whose range depends on $j$ and $l$. The scale 
and angle are illustrated in Figure 3. 
Figure 3: Curvelet tiling of space and frequency. The figure on the left 
represents the induced tiling of the frequency plane. The figure on the right 
schematically represents the spatial Cartesian grid associated with a given 
scale and orientation. 

A full description of the curvelet transform and its digital implementations 
is given in [4]. We review here one of the implementations, Curvelets via 
Wrapping, which is faster than the Curvelets via USFFT (Unequally Spaced 
Fast Fourier Transform) implementation. The transform can be implemented 
as follows: 



1. 2D-FFT: A 2D FFT (Fast Fourier Transform) is applied to obtain the Fourier 
samples 

$\hat{f}[n_1, n_2], \quad -n/2 \le n_1, n_2 < n/2$ 

2. Interpolation: For each scale-angle pair $(j, l)$, interpolate (or resample) 
$\hat{f}[n_1, n_2]$ to obtain $\hat{f}[n_1, n_2 - n_1 \tan\theta_l]$ (as 
depicted in Figure 3). 

3. Localization: Multiply the interpolated function $\hat{f}$ by a window 
function $\tilde{U}_j[n_1, n_2]$, effectively localizing $\hat{f}$ near the 
parallelogram with orientation $\theta_l$, to obtain 

$\tilde{f}_{j,l}[n_1, n_2] = \hat{f}[n_1, n_2 - n_1 \tan\theta_l] \, \tilde{U}_j[n_1, n_2]$ 

4. Inverse 2D-FFT: Apply the inverse 2D FFT to each $\tilde{f}_{j,l}$ to obtain 
the associated curvelet coefficients $C_{j,l}$. It should be noted that the 
values can be complex. Usually, we will use their norm. 

Some curvelets of a circular object can be seen in Figure 4. 




Figure 4: A few curvelets (their norms are presented) 



Efficient numerical algorithms exist for computing the curvelet transform 
of discrete data. The computational cost of a curvelet transform is 
approximately 10-20 times that of an FFT, and has the same dependence of 
$O(n^2 \log n)$ for an image of size n x n. 



3 Known algorithms 

3.1 Face Recognition by Curvelet Based Feature Extraction 

This algorithm [5] uses three different versions of the same image: 8 bit 
(original), 4 bit and 2 bit. The last two versions are obtained by quantization 
of the original image. This is illustrated in Figure 5. The idea is to find 
the prominent edges, which will remain even in the quantized image. It uses 
3 classifiers whose inputs (features) are the curvelet coefficients of the three 
gray scale representations of the same image mentioned above. 
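
The bit-depth reduction can be illustrated with a minimal sketch. The report does not state which quantization rule [5] uses, so uniform quantization (mapping each pixel to the lower edge of its bin) is an assumption here, and the function name `quantize` is hypothetical.

```python
def quantize(pixels, bits):
    """Requantize 8-bit gray levels (0..255) to `bits` bits.

    Assumption: uniform bins, each pixel mapped to the lower edge of its bin,
    so the output still lies in the 0..255 range.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    step = 256 // (2 ** bits)  # width of each quantization bin
    return [[(p // step) * step for p in row] for row in pixels]


image = [[0, 37, 64, 130], [200, 255, 17, 99]]
four_bit = quantize(image, 4)  # 16 gray levels
two_bit = quantize(image, 2)   # 4 gray levels
```

Each of the three versions (the original, `four_bit` and `two_bit`) would then be passed through the curvelet transform separately, which is why the method needs three transforms per image.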





Figure 5: Original image (left) and its quantized versions 

Since there are curvelets for each scale and angle, there is some freedom in 
the choice of the coefficients (unless all the curvelets are taken into account, 
which is not the case). However, it is not clear from the description of this 
algorithm how they should be chosen. 

Each testing sample undergoes the same procedure (including quantiza- 
tion), and according to majority vote of the 3 classifiers, its class (person) 
is decided. If there is no "winner", the image is declared as "rejected". The 
classifiers which are used are SVMs (Appendix B). 
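
The voting rule with rejection can be sketched as follows. This is only an illustration: the function name `majority_vote` and the choice to treat any tie for first place as "no winner" are assumptions, not details taken from [5].

```python
from collections import Counter


def majority_vote(labels, reject="rejected"):
    """Return the most frequent label, or `reject` when no label wins outright."""
    if not labels:
        raise ValueError("at least one vote is required")
    counts = Counter(labels).most_common()
    # A tie for first place means there is no winner (assumed rejection rule).
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return reject
    return counts[0][0]


# Votes of the 8-bit, 4-bit and 2-bit classifiers for one test image.
decision = majority_vote(["anna", "anna", "ben"])
```

With three classifiers, rejection happens exactly when all three disagree.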

This method has two main drawbacks. First, it is computationally 
expensive, especially when dealing with large databases, since the transform 
is done 3 times (on 3 different versions of the same image). Moreover, the 
assumption that the prominent edges remain even after quantization can 
sometimes be misleading; in fact, some additional contours can be created in 
the quantization process. 

3.2 Face recognition using curvelet based PCA 

This algorithm [6] works in a similar way to the previous one (3.1), but 
it does not use quantized versions of the image; it works directly 
on the unquantized image. This algorithm introduces a significant complexity 
reduction through the use of PCA. It was shown in [6] that even 50 principal 
components provide good performance. Moreover, this algorithm provides 
better results than algorithms which use the wavelet transform. 
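
The dimensionality reduction step can be sketched in a few lines. The code below is not the method used in [6] or in the attached Matlab code; it is a dependency-free illustration that finds the leading principal directions by power iteration with deflation, which is a swapped-in technique chosen for brevity. The names `pca_fit` and `pca_transform` are hypothetical.

```python
import math
import random


def pca_fit(samples, n_components, iters=500, seed=0):
    """Estimate the mean and the leading principal directions of `samples`.

    Assumption: power iteration with deflation. Fewer than `n_components`
    directions are returned when the data has lower rank.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    centered = [[s[i] - mean[i] for i in range(d)] for s in samples]

    def cov_times(v):
        # Computes (X^T X / n) v without forming the d x d covariance matrix.
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        return [sum(p * row[i] for p, row in zip(proj, centered)) / n
                for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = cov_times(v)
            for c in components:  # deflation: remove directions already found
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no variance left outside the found directions
                return mean, components
            v = [a / norm for a in w]
        components.append(v)
    return mean, components


def pca_transform(sample, mean, components):
    """Project one feature vector onto the principal directions."""
    diff = [a - m for a, m in zip(sample, mean)]
    return [sum(a * b for a, b in zip(diff, c)) for c in components]


data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean, comps = pca_fit(data, 1)
reduced = pca_transform([3.0, 6.0], mean, comps)
```

In the face recognition setting, each row of `data` would be the curvelet feature vector of one training image, and the same `mean` and `comps` would be reused to project the test images.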

In this algorithm, the classification is done using the k-NN algorithm, 
with k = 1. The coefficients are chosen from one scale (in [6] 
the scale was chosen to be 3), and all the coefficients which are associated 
with this scale are used (they are placed into one long row vector). 



4 The approach in this project 



The approach taken in this project tries to integrate both former algorithms. 
First, it is clear that the complexity should be tolerable, so transforming 
different versions of the same image should be avoided. Moreover, instead 
of using the curvelets in only one scale, we can use a different classifier for 
each scale (it should be clear that this still requires only one transform per 
image), and decide by majority vote on the class of each image. This procedure 
is depicted in Figure 6. A similar idea appears in [7]. 



Figure 6: Algorithm - General scheme 

The classifiers which are used are k-NN and SVM (Support Vector 
Machines). A description of these classification algorithms is given in 
Appendices A and B. We should also evaluate performance when only a small 
number of pictures is available for training (this is the situation in most 
scenarios). 

It has been shown in [8] that the recognition accuracy of face images does 
not degrade significantly if the size of the image is reduced (especially for 
high resolution pictures). Hence, pictures from the Georgia-Tech database 
(which are 640 x 480) were reduced by a factor of six before additional 
processing. After this operation, their size is similar to that of images from 
the other two databases. Moreover, color pictures are converted to gray scale 
images. 
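
The two pre-processing operations can be sketched as below. The report does not say how the resizing or the color conversion is performed, so both choices here are assumptions: gray levels are computed with the common luminance weights 0.299, 0.587 and 0.114, and the size reduction averages non-overlapping blocks. The function names are hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to gray levels (assumed luminance weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def downsample(gray, factor):
    """Shrink a gray image by averaging non-overlapping factor x factor blocks.

    Rows and columns that do not fill a whole block are dropped.
    """
    rows = len(gray) // factor
    cols = len(gray[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [gray[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# A 640 x 480 image (480 rows of 640 pixels) reduced by a factor of six.
full = [[0] * 640 for _ in range(480)]
small = downsample(full, 6)
```

With these assumptions, a 640 x 480 picture becomes 106 x 80, which is of the same order as the image sizes of the other two databases.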

The transform involves 8 and 16 angles for scales 2 and 3, respectively, 
whereas for scales 1 and 4 no orientation is taken into account (i.e., angle=0). 
The feature vector for each scale is created by concatenating all the values 
in the same scale. Since these values can be complex, we use only their 
2-norm (magnitude). A face and some of its curvelets are shown in Figure 7. 
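
The construction of a per-scale feature vector can be sketched as follows. The data layout (a list of 2D coefficient arrays, one per angle) and the function name `scale_feature_vector` are assumptions made for illustration; the attached Matlab code may organize the coefficients differently.

```python
def scale_feature_vector(subbands):
    """Concatenate the magnitudes of all coefficients of one scale.

    `subbands` is assumed to be a list with one 2D array (list of rows) of
    complex coefficients per angle.
    """
    return [abs(c) for band in subbands for row in band for c in row]


# Two tiny, made-up subbands standing in for two angles of the same scale.
scale_2 = [
    [[3 + 4j, 1j]],
    [[-2 + 0j]],
]
features = scale_feature_vector(scale_2)
```

One such vector is built for every scale, and each vector feeds the classifier that belongs to its scale.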

It should be noted that SVM classification requires special treatment, 
since the basic type of SVM can distinguish between only 2 classes. Hence, 
we use One-Against-All (OAA) SVM, i.e., each picture is classified 
as belonging to a specific class or not (with no additional information about 
other classes). More information is given in Appendix B. 
Figure 7: The first image is the original image. The leftmost image in the 
second row is the approximated image. The rest are curvelets in 8 different 
angles. 

Despite the simplicity of k-NN, it provided better results than SVM OAA, 
so the results presented in the next section were obtained by k-NN. 
5 Results 

In the following parts the results of the approach taken in this project are 
presented. They are given for different numbers of samples in each training 
set. These results are quite similar for PCA components in the range 50-100, 
so 100 components are used. 
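
The evaluation protocol can be sketched as follows. The report does not say which images of each subject go into the training set, so taking the first `train_size` images is an assumption, and both function names are hypothetical.

```python
def split_per_subject(images_by_subject, train_size):
    """Split each subject's images into training and testing (image, label) pairs.

    Assumption: the first `train_size` images of every subject are used for
    training and the remaining ones for testing.
    """
    train, test = [], []
    for subject, images in images_by_subject.items():
        train += [(img, subject) for img in images[:train_size]]
        test += [(img, subject) for img in images[train_size:]]
    return train, test


def recognition_rate(predicted, actual):
    """Percentage of test samples whose predicted label matches the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


faces = {"s1": ["a", "b", "c"], "s2": ["d", "e", "f"]}
train, test = split_per_subject(faces, 2)
rate = recognition_rate(["s1", "s2", "s1"], ["s1", "s2", "s2"])
```

Repeating this for several values of `train_size` gives one point per training size, as in the plots of this section.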

The use of PCA reduces the complexity of the classification part by a 
few orders of magnitude. Even a small number of coefficients (say, 100 instead 
of thousands) proves to be highly efficient. However, the PCA decomposition 
itself can be computationally demanding. Hence, only the first 15 sets of faces 
were used in each database. 

Because of the long time it takes to calculate the PCA decomposition, 
this process was done in advance (offline), using the Signal and Image 
Processing Lab (SIPL) Matlab server. It should be clear that once the principal 
components for the training sets are known, applying the same process to 
the testing sets is easier. 

It should also be noted that the k-NN algorithm is much less complex 
than SVM. It also shows better performance, so its results are presented 
here. Moreover, when solving the numerical optimization problem of SVM, 
convergence is not necessarily attained. In the attached Matlab file, the 
results for both methods can be obtained easily. 

5.1 AT&T (ORL) database 

AT&T (ORL) database [9] contains 10 different images (92 x 112) each for 
40 distinct subjects. Images in this database were taken at different times 
varying the lighting, facial expression and facial details (glasses/no glasses). 
All the images were taken against a dark homogeneous background with 
the subjects in an upright, frontal position (with tolerance for some side 
movement). 

The pictures were taken between April 1992 and April 1994, and they are 
8-bit grayscale in PGM format. Sample images of this dataset are shown in 
figure 8. 

In Figure 9 below, the performance of the algorithm for the AT&T (ORL) 
database is given. It can be seen that good results are obtained. It should 
be noted that an anomaly is seen (the recognition rate for a training size of 
6 should be higher than for a training size of 5), but it is reasonable to assume 
that averaging over many sets would 'smooth' this. This averaging could not 
be done due to limited computational resources. 
Figure 8: Sample faces from AT&T (ORL) 



Figure 9: Classification results for AT&T (ORL) (plot title: AT&T (ORL), 
PCA Components=100; horizontal axis: training size) 

5.2 Essex Grimace database 

Essex Grimace database [10] contains a sequence of 20 images (180 x 200) 
each for 18 individuals consisting of male and female subjects, taken with 
a fixed camera. During the sequence, the subject moves his/her head and 
makes grimaces which get more extreme towards the end of the sequence. 
Images are taken against a plain background, with very little variation in 
illumination. Sample images of this database are shown in figure 10. 

In Figure 11 below, the performance of the algorithm for the Grimace 
database is given. 

It can be seen that even when less than half of the images are used 
as the training set, the results are very good. This can be attributed to the 
fact that each picture in this database includes the whole face. Moreover, the 
background is homogeneous. The algorithm shows good results despite the 
fact that there are different facial expressions. 

Figure 10: Sample faces from Essex Grimace 

Figure 11: Classification results for Grimace (plot title: Grimace, 
PCA Components=100; horizontal axis: training size) 

5.3 Georgia-Tech database 

The Georgia Tech face database [11] contains images of 50 people taken during 
1999. All people in the database are represented by 15 color JPEG images with 
cluttered background taken at a resolution of 640x480 pixels. The average size 
of the faces in these images is 150x150 pixels. The pictures show frontal and/or 
tilted faces with different facial expressions, lighting conditions and scale. 
Sample images can be seen in Figure 12. 

In Figure 13 below, the performance of the algorithm for the Georgia-Tech 
database is given. 

Figure 12: Sample faces from Georgia-Tech 

Figure 13: Classification results for Georgia-Tech (plot title: Georgia-Tech, 
PCA Components=100; horizontal axis: training size) 

This database poses a challenge, since the pictures are not focused on 
faces. Moreover, the background is not homogeneous and the shadow of the 
head can be seen easily in each picture. Despite this issue, using even less 
than half of the pictures as the training set provides very good results. 

5.4 Comparison to known results 

In [5], which is described in section 3.1, 6 images were used as the training 
set for AT&T (ORL), 12 for Grimace and 9 for the Georgia-Tech database. The 
(averaged) results of this algorithm are given in Figure 14. 

The results are similar to those of our algorithm, whereas in the case of the 
Georgia-Tech database our algorithm outperforms the algorithm given in [5]. 
This can be explained by the challenging images of this database: the approach 
in [5] produces quantized versions of the same image to detect the face's 
curves, but it is not so efficient for this kind of picture. 

Database          Recognition Rate 
AT&T (ORL)        99.4% 
Essex Grimace     98.1% 
Georgia-Tech      85.9% 

Figure 14: Classification results from [5] 

In [6], which is described in section 3.2, 5 images per subject for AT&T 
(ORL) and 8 images per subject for Grimace were used as the training set 
(the Georgia-Tech database was not used). The results are presented in 
Figure 15. 

          Recognition Rate (%) 
20        99.8 
30        99.8      96.6 
50        100       96.6 
70        100       96.6 
90        100       96.6 
110       100       96.6 

Figure 15: Classification results from [6] 

Our results are very similar, and in the case of Grimace they are even 
better. 

In all the algorithms, the most difficult images to classify were those in 
which the subject's face is not directed at the camera. Indeed, it is reasonable 
to assume that the curves of the face are more easily captured when the full 
face is directed at the camera. 
6 Matlab code 

This report is accompanied by Matlab code which can reproduce the results 
presented above, along with many more. One only needs to define which 
database to use, the training set size (per subject) and how many PCA 
components should be used. 

As described above, the PCA decomposition was done in advance, 
so its results for 50 to 120 components are saved in the relevant 'mat' 
files. It was also done for each database and for each training set size (5 to 
8), so the results are obtained very quickly. It also exempts us from the need 
to store all the databases. 

SVM classification is disabled by default, since its performance is worse 
than that of k-NN. Moreover, it is computationally demanding and the 
optimization problem involved in this classification method does not 
necessarily converge, so the results of this method should be taken with 
caution. 

Simply run the "runme.m" file after adding the main directory to the 
Matlab path. More instructions are given in this file. Some output which 
was produced using this code is shown in Figure 16. 



Command Window 

Choose database: (1) AT&T (ORL) (2) Grimace (3) Georgia-Tech 
2 
Using Grimace database 
PCA coefficients number: 50 / 60 / 70 / 80 / 90 / 100 / 110 / 120 
80 
Training set size: 3 / 4 / 5 / 6 / 7 / 8 
5 
k-NN Classification... 
Database: Grimace 
Training number: 5 
PCA coefficients number: 80 
Classifier: KNN 
Recognition rate: 97.7679% 

Figure 16: Output results from Matlab - Example 
7 Conclusions 

In this project we demonstrated the ability of the curvelet transform to assist 
in face recognition. It required learning about this transform and about 
common classification methods, such as k-NN and SVM. The main part of 
this project was the proper use of the curvelet transform coefficients as 
input to a majority vote classifier, which was constructed according to the 
algorithm introduced in this project. 

The initial idea was to implement one of the algorithms which were 
reviewed in section 3, but while looking for related material, the idea of 
building another algorithm based on these two appeared to be more challenging. 
Moreover, the main algorithm did not require the use of an SVM classifier, and 
building such a classifier was also a significant part of the project, though 
the k-NN classifier obtained better results. 

Some implementation issues were considered. First, due to limited 
memory, we could use only the first 15 sets of faces in each dataset. Moreover, 
the PCA process takes a lot of time, so this process was done offline, and the 
PCA coefficients were saved. It should also be noted that an SVM classifier 
usually deals with only two classes, so its adaptation to multi-class 
classification was also a significant part of the project. 

To improve the results of the algorithm, it is suggested to add some 
pre-processing steps. The first one should crop the picture so that only the 
face is shown (face detection). This could improve the results, especially 
for the Georgia-Tech database. For this task, it would be a good idea to also 
use the color information of the image (instead of converting the images to 
grayscale). For example, we can determine parts of the face according 
to their colors (e.g., lips, cheeks). In addition, it would be advisable to 
examine another version of multi-class SVM, the one-against-one version, in 
which each class is compared against every other class. However, this would 
add high computational complexity to the algorithm. 
A Appendix: k-NN 

The K-nearest-neighbour algorithm [12] is one of the simplest classification 
algorithms. The training examples are vectors in a multidimensional feature 
space, each with a class label. The training phase of the algorithm consists 
only of storing the feature vectors and class labels of the training samples. 

In the classification phase, K is a user-defined constant, and an unlabelled 
vector (a query or test point) is classified by assigning the label which is most 
frequent among the K training samples nearest to that query point. Many 
metrics can be used for measuring the distance between two points. Examples 
are the Euclidean (standard 2-norm) metric and the Gaussian metric 

$\|x - y\| = \exp\left( -\frac{\|x - y\|_2^2}{\sigma^2} \right)$ with predefined values for $\sigma$ and K. 




Figure 17: Example of K-NN classification. If K=1, the green circle will be 
classified to the same class as the blue triangles 

Obviously, this algorithm is very simple: only the distances from the 
query point to all the points in the training set need to be calculated. It 
also offers fairly good results despite its low complexity. 
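
The classification phase described above can be sketched as follows, using the Euclidean metric. The function name `knn_classify` and the toy data are hypothetical; ties between labels are resolved in favour of the label whose sample is nearest, which is an assumption of this sketch.

```python
import math
from collections import Counter


def knn_classify(query, train_vectors, train_labels, k=1):
    """Label `query` by the most frequent label among its k nearest samples."""
    if k < 1 or k > len(train_vectors):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query point.
    ranked = sorted((math.dist(query, v), i) for i, v in enumerate(train_vectors))
    votes = Counter(train_labels[i] for _, i in ranked[:k])
    return votes.most_common(1)[0][0]


train_vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = ["a", "a", "b", "b"]
nearest = knn_classify((0.2, 0.1), train_vectors, train_labels, k=1)
voted = knn_classify((5.0, 4.0), train_vectors, train_labels, k=3)
```

The "training" step is nothing more than keeping `train_vectors` and `train_labels` in memory; all the work happens at query time.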

B Appendix: SVM 

An SVM classifier [13] tries to construct a separating hyper-plane between 
two groups of samples, in such a way that the distance from the sets to the 
hyper-plane is maximal. This is demonstrated in Figure 18. 

Figure 18: H3 doesn't separate the 2 classes. H1 does, with a small margin 
and H2 with the maximum margin. 

As implied by the way an SVM classifier works, it separates only two 
groups, so in case there are more than two groups, some generalization is 
needed. One-against-all (OAA) SVMs were first introduced by Vladimir 
Vapnik in 1995. The initial formulation of the one-against-all method 
required unanimity among all SVMs: a data point would be classified under a 
certain class if and only if that class's SVM accepted it and all other classes' 
SVMs rejected it. 
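
The unanimity rule can be written down directly. The sketch below assumes the binary SVMs have already been trained and evaluated, so it only takes their accept/reject outputs; the function name `oaa_decide` and the dictionary layout are hypothetical.

```python
def oaa_decide(accepts, reject=None):
    """Apply the one-against-all unanimity rule.

    `accepts` maps each class label to True if that class's binary SVM
    accepted the sample. A label is returned only when exactly one SVM
    accepted it; otherwise `reject` is returned.
    """
    accepted = [label for label, ok in accepts.items() if ok]
    return accepted[0] if len(accepted) == 1 else reject


decision = oaa_decide({"anna": False, "ben": True, "carl": False})
```

Samples accepted by several SVMs, or by none, are left unclassified under this rule.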

OAA method was used in this project. This method is depicted in Figure 
19. 




Figure 19: Diagram of binary OAA region boundaries on a basic problem 